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Abstract 

In this paper, first a great number of inverse problems wliicli arise in 
instrumentation, in computer imaging systems and in computer vision are 
presented. Tlien a common general forward modeling for them is given 
and the corresponding inversion problem is presented. Then, after show- 
ing the inadequacy of the classical analytical and least square methods 
for these ill posed inverse problems, a Bayesian estimation framework is 
presented which can handle, in a coherent way, all these problems. One 
of the main steps, in Bayesian inversion framework is the prior modeling 
of the unknowns. For this reason, a great number of such models and 
in particular the compound hidden Markov models are presented. Then, 
the main computational tools of the Bayesian estimation are briefly pre- 
sented. Finally, some particular cases are studied in detail and new results 
are presented. 

1 Introduction 

Inverse problems arise in many applications in science and engineering. The 
main reason is that, very often we want to measure the distribution of an un- 
observable quantity /(r) from the observation of another quantity g{s) which 
is related to it and accessible to the measurement. The mathematical relation 
which gives g{s) when /(r) is known is called forward problem: 

9is)^[nf{r)]{s)+e{s) (1) 

where Ti. is the forward model. In this relation, r and s may represent cither 
time t, position on a line x, position on a surface r = {x,y), position in space 
r = (.T, y, z) or any combinations of them. 



1 



This forward model is often non linear, but it can be linearized. So, in this 
paper, we only consider the linear model, which in its general form, can be 
written as 

g{s)= J h{r,s)f{r)dr + eis) (2) 

where h(r, s) represents the measuring system response and e(s) all the errors 
(modeling, linearization and the other unmodelled errors often called noise) . In 
this paper, we assume that the forward model is known perfectly, or at least, 
known excepted a few number of parameters. The inverse problem is then 
the task of going back from the observed quantity g{s) to f{r). The main 
difficulty is that, very often these problems are ill-posed, in opposition to the 
forward problems which are well-posed as defined by Hadamard [1] . A problem 
is mathematically well-posed if the problem has a solution (cxistance), if the 
solution exists (uniqueness), and if the solution is stable (stability). A problem 
is then called ill-posed if any of these conditions are not satisfied [2j . 

In this paper, we will only consider the algebraic methods of inversion where, 
in a first step the forward problem is discretized, i.e.,, the integral equation is 
approximated by a sum and the input /, the output g and the errors e are 
assumed to be well represented by the finite dimcntional vectors /, g and e 
such that: 

n 

gi ^ H^jfj + e^, z = 1, • • • , n > g = Hf + e (3) 

where g^ = g{si), e,; = e(si), = /(r^) and Hij = h{rj,Si) or in a more general 
case 

gi = < (t)i{s),g{s) >^ j (t)i{s) g{s)ds 

Ci = < (j)i{s),e{s) >= J (j)^{s) e{s)ds (4) 

/, ^ <yjj{s)J{r)>= Jip,{r)f{r)dr 

where (t>i{s) and are appropriate basis function in their corresponding 

spaces which means that, we assume 

m 

i=l 
m 

i=l 
n 

fir) ^ (5) 
Hij ~ < <i)i[s),%l:j{s) >= j j ipj{r) (l)i[s)drds 
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But, before going further in details of the inversion methods, we are going to 
present a few examples. 



1.1 ID signals 

Any instrument such as a thermometer which tries to measure a non directly 
measurable quantity f{t) (here the time variation of the temperature) trans- 
forms it to the time variation of a measurable quantity g{t) (here the length of 
the liquid in the thermometer). A perfect instrument has be at least lincair. 
Then the relation between the output g{t) and the input f{t) is: 



g{t)= / h{t,t')f{t')dt + e{t) 



(6) 

where h{t,t') the instrument's response. If this response is invariant in time, 
then we have a convolution forward model: 



git) =Jh{t- t')f{t') At + e{t) (7) 
and the corresponding inverse problem is called deconvolution. 




? 




Figure 1: Deconvolution of ID signals. 
The convolution equation ([7]) can also be written 

g{t)^ I h{T)f{t~T)dT + e{t) 



(8) 



which is obtained by change of variable t — t' = t. Assuming the sampling 
interval of f, h and g to be equal to 1, the discretized version of the deconvolution 
equation can then be written: 



7(z)=^/^(fc)/(j-fc) + eW, z = l,---,T 



(9) 



which can be written in the general vector-matrix form: 

g = Hf + e 



(10) 



where g and / contains samples of the ouput g(t) and the intput f{t) and the 
matrix H, in this case, is a Toeplitz matrix with a generic lignc composed of the 
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samples of the impulse response h{t). The Toeplitz property is thus identified 
to the time invariance property of the system response (convolution forward 
problem) . 



1.2 Image restoration 

In this paper, we consider more the case of bivariate signals or images. As an 
example, when the unknown and measured quantities arc images, we have 



(11) 



g{r) = J h{r,r')f{r')dr' + e{r) 
and if the system response is spatially invariant, we have 
5(r)= / h{r-r')f{r') + e{r). 



(12) 



The case of dcnoising is the particular case where the point spread function 
(psf) h{r) is h{r) — S{r): 



g{r) = f{r) + e{r) 



(13) 





Figure 2: Image restoration as an inverse problem. 

The discretized version of the 2D dcconvolution equation can also be written 
as g = Hf + £ where g and / contains, respectively, the rasterized samples 
of the ouput g{r) and the intput f{r'), and the matrix H in this case, is 
a huge dimensional Toeplitz-Bloc- Toeplitz (TBT) matrix with a generic bloc- 
ligne composed of the samples of the point spread function (PSF) h{r). The 
TBC property is thus identified to the space invariance property of the system 
response (2D convolution forward problem). For more details on the structure 
of this matrix refer to the book [3] and the papers [H [5l [6] . 
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1.3 Image reconstruction in computed tomography 

In previous examples, g{s) and /(r) where defined in the same space. The case 
of image reconstruction in X ray computed tomography (CT) is interesting, 
because the observed data g{s) and the unknown image /(r) are defined in 
diS'erent spaces. The usual forward model in CT is shown in Figure 

In 2D case, the relation between the image to be reconstructed f{x,y) and 
the projection data g{r, 4>) = g^(r) is given by the Radon transform: 

9ir,4>) = / f{x,y) dl + £{r,4>) 




f{x, y) S{r — X cos (/) ~ V sin 0) dxy + e(r, 0) (14) 



The discretized version of this forward equation can also be written as g = 
Hf + e where g = [qi, ■ ■ ■ , g^^] contains samples of projection datas g{r^ (p^) for 
different angles (f)^, k ^ 1, • • • , /v , / = {/(f"), r G Tl} contains the image pixels 
put in a vector and the elements Hij of the matrix H, in this case, represents 
the length of the z-th ray in the j-th pixel. This matrix is a very sparse matrix 
with great number of zero valued elements [3 [S] . 

3D 2D 



Projeclions 




Forward probelm: f{x,y) or f{x,y, z) > g^{r) or g^{ri,r2) 

Inverse problem: g,j,{r) or g^{ri,r2) — > f{x,y) or f{x,y,z) 



Figure 3: 2D and 3D X ray computed tomography 
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1.4 Time varying imaging systems 

When the observed and unknown quantities depend on space r and time i, we 
have 

g{r, t)^ j h{r -r',t- t')f{r', t') dr' At' + e(r, t) (15) 

If the point spread function of the imaging system does not depend on time, 
then we have 



g{r,t) = j h{r ~r')f{r' ,t)dr' + e{r,t) (16) 
In this case, t can also be considered as an index: 

9t{r) = Jh{r^ r')ft{r') dr' + et{r) (17) 

One example of such problem is the video image restoration shown in Figure ^ . 




1 I 




Figure 6: Inverse problem of video image restoration 

The discretized version of this inverse problem can be written as 

g, = Hf, + et (18) 

where and contains samples of the ouput gt{r) and the intput ft{r') and 
the matrix H , in this case, is again a Toeplitz-Bloc-Toeplitz (TBT) matrix with 
a generic bloc-ligne composed of the samples of the point spread function (PSF) 
h{r). 

1.5 Multi Inputs Multi Outputs inverse problems 

Multi Inputs Multi Outputs (MIMO) imaging systems can be modeled as: 

N 

g^{s)=Y, j K,{s,r)f,{r)dr + e,{r), i^l,---,N (19) 
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1.5.1 MIMO sources localisation and estimation 

One such example is the case where n radio sources {fj(t),j = 1, • • • , n} emit- 
ting in the same time are received by m receivers {gi{t),i = l,---,m}, each 
one receiving a hnear combination of delayed and degraded versions of original 
waves: 

N 

gm = E / ^^^(^ - - ) + i^i,---,N (20) 

3 = 1-' 

where hij (t) is the impulse response of the channel between the i-th receiver and 
the j-th source. The discretizcd version of this inverse problem can be written 
as 

g,^H,,,f^+e, (21) 

where and fj contains samples of the ouput gi (t) and the intput ft (t) and the 
matrices Hij are Toeplitz matrices described by the impulse responses hi_j{t). 

1.5.2 MIMO deconvolution 

A MIMO image restoration problem is : 

= E / h.,{r~r')f,{r')dr' + e,ir) (22) 
j 

and one such example is the case of color image restoration where each color 
component can be considered as an input. 




Figure 7: Color image restoration as an example of MIMO inverse problem. 
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1.6 Source Separation 

A particular case of a MIMO inverse problem is the blind source separation 
(BSS): 

9^{r) ^'^'^^^(^ - r')f,{r') dr' + e.(r) (23) 

and a more particular one is the case of instantaneous mixing: 

g,(r)=^A,,/,(r) + e,(r) (24) 

3 

The particularity of these problems is that the the mixing matrix A ~ {Ay } is 
also unknown. 




Figure 8: Blind image separation. 



1.7 Multi Inputs Single Output inverse problems 

A Multi Inputs Single Output (MISO) system is a particular case of MIMO 
when we have only one input: 

3 
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1.7.1 MIMO sources localisation and estimation 

One example of MISO inverse problem is a non destructive testing (NDT) for 
detection and evaluation of the defect created due to an impact on a surace 
of an object using microwave imaging where two images are obtained when 
a rectangular waveguide scans this surface two times. In the first scan the 
rectangular waveguide is oriented in shorter side and in the second case in 
longer side. By this way, two images gi{r),i = 1,2 are obtained, each has to be 
considered as the output of a linear system with the same input /(r) and two 
different channels. This is a MISO linear and invariant systems. 

1.7.2 Image super-resolution as a MISO inverse problem 

Another MISO system is the case of Super-Resolution (SR) imaging using a few 
Low Resolution (LR) images obtained by low cost cameras: 

j 

where gi are the LR images and / is the desired High Resolution (HR) image. 
The functions hi represent a combination of at least three operations: i) a low 
pass filtering effect, ii) a mouvement (translational or with rotation and zooming 
effects) of the camera and iii) a sub-sampling. 
The following figure shows one such situation. 




Figure 9: SR problem where a scric of LR images arc used to construct a HR 
image. 

The discretized version of this inverse problem can be written as 

g^^H,.,f + e, (27) 
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where and / contains samples of the ouput gi{t) and the intput ft{t) and the 
matrices Hij are Toephtz matrices described by the impulse responses hij(t). 



1.8 Multi modality in CT imaging systems 

Using different modalities has become a main tool in imaging systems where to 
explore the internal property of a body one can use X rays, ultrasounds, mi- 
crowaves, infra-red, magnetic resonance, etc. As an example, in X ray imaging, 
the observed radiographics give some information on the volumique distribution 
of the material density inside the object while the ultrasoimd echography gives 
information on the changing positions (contours) of ultrasound properties inside 
the object. One can then want to use both techniques and use a kind of data 
fusion to obtain a higher quality of images of the body. An example of such 
situation is given in ([H 








Figure 10: Multi modality in CT imaging systems (a) Original objet, (b) Con- 
tours of the different homogeneous regions, (c) Data acquisition geometry in X 
ray tomographic, (d) Data acquisition geometry in ultrasound echography, (e) 
Observed data (sinogram) in X ray tomographic, (f) Observed data in ultra- 
sound echographie. 



1.9 Fusion of X ray and ultrasound echography. 

An example of multimodality and data fusion in CT is the use of X ray radio- 
graphic data and the ultrasound echographie data is shown in Figure (fTTj) and 
for more details on this application sec [HI [13 Ell HI] • 
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Figure 11: Inverse problem of X ray and ultrasound data fusion. 

2 Basics of deterministic inversion methods 

To illustrate the basics of the inversion methods, we start by considering the 
case of a Single Input Single Output (SISO) linear system: 

g = Hf + e (28) 

The idea can be easily extended to the case of MISO or MIMO. For an extend 
details to these methods refer to [131 E! ■ 

2.1 Match filtering 

First assume that the errors and measurement noise are negligeable and that we 
could choose the basis functions a-nd ijjj could be choosed in such a way that 
the matrix H is square (to = n) and self-adjoint {H' H = I) (un unrealistic 
hypothesis). Then, the solution to the problem would be: 

/ = H'g (29) 

This solution has been used in many cases. For example in deconvolution, this 
solution is called Matching filtering. The main reason is that, in a deconvolution 
problem, the matrix if is a Toeplitz matrix, so is its transpose H' . The forward 
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matrix operation Hf corresponds to a convolution conv(/i, /). The adjoint 
matrix operation H'g then also corresponds to a convolution convCh, g) where 
h(t) = h{-t). 

Another example is in computed tomography (CT) where the projection 
data in each angle direction is related to the image / through a projecting 
matrix in that direction Hi such that wc can write: 



9i 




Hi 












f + 




. 9k . 











and the adjoint operation: 

K 

l = H'g = Y,H'^9 (31) 

fe=i 

corresponds to what is called backprojection. 

However, as it is mentionned, the hypothesis made here are unrealistic. 



2.2 Direct inversion 

The next step is just to assume that the forward matrix is invcrtiblc. Then, one 
can try to define the solution as: 

? = H 'g (32) 

But, in practice, this also is an illusion, because, even if the matrix H is math- 
ematicaly invcrtiblc, it is, very often, very ill-conditionncd. This means that 
small errors on the data dg will generate great errors 6f on the solution. This 
method, in deconvolution, corresponds to the analytical method of inverse fil- 
tering, which is, in general, unstable. 

In other applications, the main difficulty is that, very often the matrix H is 
even not square, i.e.,m ^ n, because the number of the measured data m may 
not be equal to the number of parameters n describing the unknown function / 
in ©. 

2.3 Least square and generalized inversion 

For the case where to > n, a solution will be the least square (LS) defind as: 

/ = argmm{l|g-Jf/f } (33) 

which results to the normal equation: 

[H'H]J = H'g (34) 
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and if the matrix H'H is inversible (rang (Jf'i?) = n), then the sohition is 
given by 

/ = [H'H]^H'g (35) 

When TO < n, the problem may have an infinite number of solutions. So, we 
may choose one of them by requesting some particular a priori property, for 
example to have minimum norme. The mathematical problem is then: 

/ = arg min {||/f} (36) 
{Hf=g} 

or written differently 

minimize ||/||^ subject to Hf=g (37) 

The solution is obtained via the Lagrange multiplier method which, in this case, 
results to 




which gives 

f, = H\HH')-^g (39) 

if HH* is invertible. 

The main difficulty in these methods is that the solution, in general, is too 
sensitive to the error in the data due to the ill conditionning of the matrices to 
be inverted. 



2.4 Regularization methods 

The main idea in regularization theory is that a stable solution to an ill-posed 
inverse problem ca nnot be obtained only by minimizing a distance between 
the observed data and the output of the model, as it is for example, in LS 
methods. A general framework is then to define the solution of the problem as 
the minimizer of a compound criterion such as: 

/ = argmm{J(/)} (40) 

with 

J(/) = Ai(g,ff/) + AA2(/,/o) (41) 

where Ai and A2 are two distances, the first defined in the observed quantity 
space and the second in the unknown quantity space. A is the regularization 
parameter which regulates the compromize with the two terms and /q is an a 
priori solution. Un example of such criterion is 

J{f) = \\g-Hfr + \\\f-f,r (42) 

which results to 

f = f, + [H'H + XI]-'H'{g-Hf,) (43) 
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We may note that the condition number of the matrix to be inverted here can be 
controlled by appropriately choosing the value of the regularization parameter 
A. 

Even if the methods based on regularization approach have been used with 
success in many applications, three main open problems still remains: i) Deter- 
mination of the regularization parameter, ii) The arguments for choosing the 
two distances Ai and A2 and iii) Quantification of the uncertainties associated 
to the obtained solutions. Even if there have been a lot of works trying to an- 
swer to these problems and there are effective solutions such as the L-curve or 
the Croos Validation for the first, the two others are still open problems. The 
Bayesian estimation framework, as we will see, can give answers to them |15| . 



3 Bayesian estimation framework 

To illustrate the basics of the Bayesian estimation framework, let first consider 
the simple case of SISO system g = Hf + e where we assume that H is known. 
In a general Bayesian estimation framework, the forward model is used to define 
the likelihood function p{g\f, 61) and we have to translate our prior knowledge 
about the unknowns / through a prior probability law p{f [62) and then use the 
Bayes rule to find an expression for p{ f\g, 6) 

.uM^mMMM (44) 

where p{g\f,Oi) is the likelihood whose expression is obtained from the for- 
ward model and assumption on the errors e, 6 = {81,02) represents all the 
hyperparameters (parameters of the likelihood and priors) of the problem and 

p{g\e) = j p{g\f,e^) p{f 102) df (45) 

is called the evidence of the model. 

When the expression oi p{f\g,6) is obtained, we can use it to define any 
estimates for /. Two usual estimators are the maximum a posteriori (MAP) 

/ = argmax{p(/|sr, 9)} (46) 

and the Mean Square Error (MSE) estimator which corresponds to the posterior 
mean 

/ = / fp{f\g,0)df. (47) 



Unfortunately only for the linear problems and the Gaussian laws where p{ f\g, 0) 
is also Gaussian we have analytical solutions for these two estimators. For al- 
most all other cases, the first one needs an optimization algorithm and the 
second an integration one. For example, the relaxation methods can be used 
for the optimization and the MCMC algorithms can be used for expectation 
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computations. Another difficult point is that the expressions ofp{g\f,6i) and 
p(/l^2) and thus the expression of p{f\g,6) depend on the hyperparameters 
which, in practical applications, have also to be estimated either in a supervised 
way using the training data or in an unsupervised way. In both cases, we need 
also to translate our prior knowledge on them through a prior probability p{0). 
Thus, one of the main steps in any inversion method for any inverse problem 
is modeling the unknowns. In probabilistic methods and in particular in the 
Bayesian approach, this step becomes the assignment of the probability law 
p{f\di)- This point, as well as the assignment of p{6), are discussed the next 
two subsections. 



3.1 Simple case of Gaussian models 

Let consider as a first example the simple case where e and / arc assumed to 
be Gauusian: 



(X exp — 2^e*e 
p{fW},Po) = Af{fo,Rf = a}Po) 



oc cxp 

Then, it is esay to to show that: 



hif-foYPo'if-fo 



oc exp 



-^{g-HfY{g-Hf] 



M{Hf^,HRjH' + R, 



p{g,fW^,cF%Pa) oc cxp 



^(g-HfYig-Hf) 
-2%(/-/oW(/-/o) 



and 



with 



p{f\9,ol,a'],Po) 



o(gJW'l<y),Po) 
p{g\ai,a},Po) 



f = fo + RfH'{HRfH' + R,)-Hg-Hfo) 

= PH'R:\g Hf,), 

P = Rf RfH'{HRfH' + R,)~^HRf 

= {RJ^ + H*R;^H)-\ 

When /q = and noting by A = these relations write: 



(48) 



(49) 



(50) 



(51) 



WH + XPn 



H'g = PH'g 



(52) 
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It is noted that, in this case, all the point estimators such as the the MAP, the 
posterior mean or posterior median are the same and can be obtained by 

/ argmax{p(/|sr)} = argnnn {- lnp(/|g)} = argnun{J(/)} (53) 



with 

J(/) = ||g-ff/f + A(/*P„-V) 
Three particular cases are of interest: 



(54) 



• Pq = I. This is the case where fj are assumed centered, Gaussian and 
i.i.d.: 



p{f) oc exp 



2o) 



cx exp 



(55) 



• Pq — CC . This is the case where fj are assumed centered, Gaussian but 
correlated, the vector / is then considered to be obtained by: 



with C corresponds to a moving average (MA) filtering and p(^) 
In this case, we have: 



p(/) « exp 



oc exp 



1 

2^ 



ic/ll 



(56) 



(57) 



• Pq = [D^D) ^ = [I — A) ^. This is the case where fj are assumed 
centered, Gaussian and autoregressive: 



with A a matrix obtained from the AR coefRcients and p{$,) 
In this case, we have 



p{f) oc exp 



A particular case of AR model is the first order Markov chain 

p(/.l/,)-AA(/,_i,a2.) 
with corresponding A and D = I A matrices 



(58) 
A/-(0,7). 

(59) 
(60) 



A = 



. 

1 . 

1 

. . 



1 



1 . . 
-1 1 . . 
0-11. 







-1 1 



(61) 
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which give the possibiHty to write 



p{f) (X exp 



1 
2^ 



2.Df\\ 

f 



oc exp 



"2^2 ^(/j fj-i) 
J i 



(62) 



These particular cases give us the possibility to extend the prior model to other 
more sophisticated non-Gaussian models which can be classified in three groups: 



• Separable: 



p{f) oc exp 



(63) 



where </> is any positive valued function. 
• Simple Markovian: 



p{f) oc exp 



(64) 



where cj) is any positive valued function called potential function of the 
Markovian model. 



Compound Markovian: 



p{f\c) oc exp 



(65) 



where (f> is any positive valued function whose expression depends on the 
hidden variable c. 



Some examples of the expressions used in many applications are: 



(/)(t) = <^<^; \tf,l<P<2; -tlnt+l,t>0; min(t^l); 



-1 



l+t2 



(66) 



These equations can easily be extended for the case of multi-sensor case. 

However, even if a Gaussian model for the noise is acceptable, this model is 
rarely realistic for most real word signals or images. Indeed, very often, a signal 
or an image can be modeled locally by a Gaussian, but its energy or amplitude 
can be modulated, i.e.; piecewise homogeneous and Gaussian [TBI [TTl [TSl fT9] . 
To find an appropriate model for such cases, we introduce hidden variables and 
in particular hidden Markov modeling (HMM). In the following, we first give a 
summary description of these models and then we will consider the general case 
of MIMO systems with prior HMM modeling. 
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3.2 Modeling using hidden variables 
3.2.1 Signal and images with energy modulation 

A simple model which can capture the variance modulated signal or images is 

p(/j|dj,A) =A/'(0,2dj) and | A) = 5(3/2, A) (67) 
where Q is a Gamma distribution. It is then easy to show the following relations: 



p{f.d\\) 
P{9\f) 



oc exp 
oc exp 



g 



-1 

2ov 



4 + 



(68) 



and 



p{f,d\g) oc exp[-J(/,d)] 

with J{f,d) = ^Wg-Hff 



(69) 



If we try to find the joint MAP estimate of the unknowns (/, d) by optimisation 
successively with respect to / when d is fixed and with respect to d when / is 
fixed, we obtain the following iterative algorithm: 



with D = diag [1/(4^2), j ^ 1, ... ,n] 
/.72 



(70) 



3.2.2 Amplitude modulated signals 

To illustrate this with applications in telecommunication signal and image pro- 
cessing, we consider the case of a Gaussian signal modulated with a two level or 
binary signal. A simple model which can capture the variance modulated signal 
or images is 

pifj\z,,X) = M{z„2/\) 

with e {toi 0, m2 = 1}, (71) 

P(zj=mfc) = (1/2),A:= 1,---,X = 2 

It is then easy to show the following: 







(1/2)[AA(0,2/A)+AA(1,2/A) 






Eti(l/2)AA(m,,a2 


= 2/A) 


p{f\z,X) 


OC 


exp 






P{f]\Zj,X) 


oc 


exp 


-Kfj-z,?] 




p(.z\f,\) 


oc 


exp 






P{z,=k\f„X) 


oc 


exp 






P{9\f.<^e') 


oc 


exp 
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and 

A) oc cxp[-J{f,z)] 

with Jif.z) = _i^||g_Jf/||2+A||/-^i|2 (73) 

+ Mi/2)EfcE,<^fe-'"fe) 

where z = [zi, • • ■ , zjq\ . 

Again, trying to obtain the JMAP estimate (/, 2) by optimizing successively 
J[f,z) with respect to / and z we obtain: 



/ = {a,-^H'H + \I)-^[H'g + Xz] 
1 /, > a 
fj <a 



(74) 



where the threshold a is a function of A. 



3.2.3 Gaussians mixture model 

The previous model can be generalized to the general mixture of Gaussians. We 
then have the following relations: 

P{.f]\zj = k,mk,Vk) = 7V(TOfe,Wfc = 2/Afc) 



P{z, = k) 
p{fj\Trk,mk,Vk) 

p{f\z,m,X) 

P(sl/,c^e^■f^^) 
and thus: 

p{z\f,m,X) 



= TTfe Zj e {1, ■ • • , if} 



oc exp 
oc exp 
oc exp 



- J2j Efc ^kSizj - k){fj - ruk 



(75) 



p{f\z,m,\)Y[^'K^^ 



oc exp 



P{zj ~ k\f ,m, X) (x exp 



- J2j Ekl^kSizj - k){fj - nikf + InTTfc] 

-Afc(/j - TUkf + InTTfc] 



(76) 



and 
with 



Ag, X) oc exp [- J(/, z) 



ihWa Hff + Ek E{,:.,=fc} Afe(/, - mkY 
+ Efe ln(7rfe) E, ^{zj - "ifc) 
5^||g-ff/r + EfcAfc||/fc-mfcl|p 
+ Efc "fe ln(7rfc) 



(77) 



where m = {mi, • • • , toa:}, A = {Ai, • • • , Xk}, nk — Ej ^i^j ~ k) ^^l*^ number 
of samples fj which are in the class Zj = k and /fc = {fj : Zj = k}. For more 
details and apUications of such modeling see [221 Ell EH E3 • 
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3.2.4 Mixture of Gauss-Markov model 

In the previous model, we assumed that the samples in each class are indepen- 
dent. Here, we extend this to a markovian model: 

Pif]\zj = k,Zj-i k, fj_i,mk,Vk) = J\f{mk,Vk) 

p{fj\zj =k,Zj_i = k,fj_i,mk,Vk) = M{fj-i,Vk) (78) 

P{zj^k) = TTfe Zje{lr--,K} 

which can be written in a more compact way if we introduce qj = l^S{zj ~ Zj^i) 

by 

P{f3\(ljjj-i,'mk,vk) =J\f{qjmk + (1 - qj)fj-i,Vk) (79) 
which results to: 

p{f\z,'rn,\)(xexp 
occxp 



- Ek ^kS{z, - k)[f, - {q.ruk + (1 - g,)/.-i)] 

- Ek ^k6iz, - km - q,){f, - f,^,r + qjifj - mkf 

(80) 

and when combined with 

1 



P{g\f^(^e^) « exp 



2a, 



\\9-Hf\ 



gives: 

with 

J{f,z) 



P{f, z\g, G^, m, A) oc exp [- J(/, z) 



j^\\g-Hff 

+ % Efc Afc5(z, - k)[f, - {q,mk + (1 - q-j)f,-i)? 
+ T.k^k ln(7rfc) 

2^119 - Hff + - qi){h - h-i? + Efc"/cln(^fc) 



(81) 



where /-,- = ^zjifj ~ i^zj), D is the first order finite difference matrix and Q is 
a matrix with qj as its diagonal elements. 

A particular case of this model is of great interest: = 0,Vfc and = 
A, V/e. Then, we have: 



P{f]\qjj]-i,mk,vk) 
p{f\q,m,X) 



N{{l-qj)fj-i,Vk) 



(X exp 
cx exp 



-E,Hfj-ii-q,)f,-i)] 
-AE,[(i-9.)(/.-/.-i)' + 'z./: 



(82) 



and 



Pif, q\a, o-e^, rn, A) cx exp [- J(/, q) 
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with 



J{f,Q) = ^\\9-Hfr + \j:Mi-qm-f,-i? + q,ff] 

+ (83) 
= ^\\g- Hff + XWQDff + Efc ln(a,) 

where = is the number of discontinuities (length of the contours in 

the case of an image) ak = P{qj ~ 1) and 1 — Uk — P{qj — 0). 

In all these mixture models, we assumed Zj independent with P{zj ^ k) = 
TTfe. However, Zj corresponds to the label of the sample fj. It is then better 
to put a markovian structure on it to capture the fact that, in general, when 
the neighboring samples of fj have all the same label, then it must be more 
probable that this sample has the same label. This feature can be modeled via 
the Potts-Markov modeling of the classification labels Zj. In the next section, 
we use this model, and at the same time, we extend all the previous models to 
2D case for applications in image processing and to MIMO applications. 

3.3 Mixture and Hidden Markov Models for images 

In image processing applications, the notions of contours and regions are very 
important. In the following, we note hy r — {x, y) the position of a pixel and 
by f{r) its gray level or by f{r) = {/i(t"), • • • , /jv('")} its color or spectral 
components. In classical RGB color representation iV = 3, but in hyperspectral 
imaging N may be more than one hundred. When the observed data are also 
images we note them by g[r) ~ {gi{i'), ■ ■ ■ ,gM{r)}- For any image fj{r) we 
note by qj{r), a binary valued hidden variable, its contours and by Zj{r), a 
discrete value hidden variable representing its region labels. We focus here on 
images with homogeneous regions and use the mixture models of the previous 
section with an additional Markov model for the hidden variable Zj{r). 

3.3.1 Homogeneous regions modeling 

In general, any image fj{r),r E TZ is composed of a finite set Kj of homo- 
geneous regions Rj^, with given labels Zj{r) = k,k ^ l,---,Kj such that 
Rji^ = {r : Zj{r) = k}, TZj = UkRjf. and the corresponding pixel values 

Afe ~ {/;■('") • ^ S ^Jk} ^^'^ fj " ^^fjy "^^^ Hidden Markov modehng 
(HMM) is a very general and efficient way to model appropriately such images. 
The main idea is to assume that all the pixel values fj^ = {fj{r), r e Rj/^} of 
a homogeneous region k follow a given probability law, for example a Gaussian 
N{nijj^l, '^ji,) where 1 is a generic vector of ones of the size n^j. the number of 
pixels in region k. 

In the following, we consider two cases: 

• The pixels in a given region are assumed iid: 

p{f,{r)\z,{r) =k)= N{m,^,a]^), = 1, • • • , i^, (84) 
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and thus 

p(/^.Jz,(r) = fc) =p(/,(r),r e R, ^ Af{m, ,l,a] ^I) (85) 

This corresponds to the classical separable and monovariate mixture mod- 
els. 

• The pixels in a given region are assumed to be locally dependent: 

Pifj^r) = k)= piMr),r G R,,) = Mim.^l, £,,) (86) 

where Sj j, is an appropriate covariance matrix. This corresponds to the 
classical separable but multivariate mixture models. 

In both cases, the pixels in different regions are assumed to be independent: 



Ki 



Ki 



(87) 



k=l 



k=l 





f{r): Mixture of iid Gaussian f{r): Mixture of Gauss-Markov 



Figure 12: Mixture and hidden Markov models for images 



3.3.2 Modeling the labels 

Noting that all the models ((84)) . ((85|) and (|86|) are conditioned on the value of 
Zj{r) = fc, they can be rewritten in the following general form 



^Jk^'^Jk) 



(88) 
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where either is a diagonal matrix Sj^, = crj^-f or not. Now, we need also 
to model the vector variables Zj ~ {zj{r),r G TZ). Here also, we can consider 
two cases: 

• Independent Gaussian Mixture model (IGM), where {zj{r),r e TZ] are 
assumed to be independent and 

P{zj{r) = fc) = pk, with ^pfc = 1 and p{zj) = (89) 

fc fc 



• Contextual Gaussian Mixture model (CGM), where Zj = {2;_,(r),r e 7^} 
are assumed to be Markovian 



p[z^) oc exp 



(90) 



which is the Potts Markov random field (PMRF). The parameter a con- 
trols the mean value of the regions' sizes. 



3.3.3 Hyperparameters prior law 

The final point before obtaining an expression for the posterior probability law 
of all the unknowns, i.c, p{f,0\g) is to assign a prior probability law p{6) to 
the hyperparameters 6. Even if this point has been one of the main discussing 
points between Bayesian and classical statistical research community, and still 
there are many open problems, we choose here to use the conjugate priors for 
simplicity. The conjugate priors have at least two advantages: 1) they can 
be considered as a particular family of a differential geometry based family of 
priors [IHIIITIIIH] and 2) they are easy to use because the prior and the posterior 
probability laws stay in the same family. In our case, we need to assign prior 
probability laws to the means 'Tij^., to the variances (t|^ or to the covariance 
matrices Sj^, and also to the covariance matrices of the noises of the likelihood 
functions. The conjugate priors for the means m^^ are in general the Gaussians 
A/'(mo , , (T^ ), those of variances cr? are the inverse Gammas JO(aQ, /Sq) and 
those for the covariance matrices Sj^ are the inverse Wishart's XyV(ao, ■'^o)- 

3.3.4 Expressions of likelihood, prior and posterior lav^^s 

We now have all the elements for writing the expressions of the posterior laws. 
We are going to summarizes them here: 

. Likelihood: 0) = Y{T=i PigH, S„) = Ilfii J^ig - /, ^.^) 

where we assumed that the noises are independent, centered and Gaus- 
sian with covariance matrices which, hereafter, are also assumed to 
be diagonal S^j = (Tef/. 



24 



• HMM for the images: p{f\z,e) = Uj=iPifj\^j,'mj, Sj) 

where we used z_ = {zj, j = 1, • • • , N} and where we assumed that fj\zj 
are independent. 

• PMRF for the labels: cx HjLi exp ct.J2rGn^s<£V{r) ^('^ji''') ^ -^ji^)) 
where wc used the simplified notation p{zj) = P{Zj{r) = z{r),r e TZ) 
and where we assumed {zj, j ~ 1, • • ■ , N} are independent. 

• Conjugate priors for the hyperparameters: 

• Joint posterior law oi f, z_ and 

p{f,z, e\g) « p{g\f, e,) p{f\z, 02) p{z\e2) p{e) 



3.4 Bayesian estimators and computational methods 

The expression of this joint posterior law is, in general, known upto a normali- 
sation factor. This means that, if wc consider the Joint Maximum A Posteriori 
(JMAP) estimate 

QJl) = arg max {p{f,z,e\g)} (91) 

iLz^O) 

we need a global optimization algorithm, but if we consider the Minimum Mean 
Square Estimator (MMSE) or equivalently the Posterior Mean (PM) estimates, 
then we need to compute this factor which needs huge dimentional integrations. 
There are however three main approaches to do Bayesian computation: 

• Laplace approximation: When the posterior law is unimodale, it is rea- 
sonable to approximate it with an equivalent Gaussian which allows then 
to do all computations analytically. Unfortunately, very often, p{f,z_,6\g) 
as a function of / only may be Gaussian, but as a function of z or 6 is 
not. So, in general, this approximation method can not be used for all 
variables. 

• Variational and mean field approximation: The main idea behind this 
approach is to approximate the joint posterior pif^z^Olg) with another 
simpler distribution q{f,z,6\g) for which the computations can be done. 
A first step simpler distribution q{f,z,0\g) is a separable ones: 

q{Lz,e\g)^q,il)q2{z)q3ie) (92) 

In this way, at least reduces the integration computations to the product 
of three separate ones. This process can again be applied to any of these 
three distributions, for example qi{f) = Yljqijifj)- With the Gaussian 
mixture modeling we proposed, qi{f) can be choosed to be Gaussian, 
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92(2) to be separated to two parts qiB{z_) and qiw{^) where the pixels of 
the images are separated in two classes B and W as in a checker board. 
This is thanks the properties of the proposed Potts-Markov model with 
the four nearest neighborhood which gives the possibility to use qisiz) 
and qiw{z) separately. For qsid) very often we also choose a separable 
distribution which use the conjugate properties of the prior distributions. 

• Markov Chain Monte Carlo (MCMC) sampling which gives the possibily 
to explore the joint posterior law and compute the necessary posterior 
mean estimates. In our case, we propose the general MCMC Gibbs sam- 
pling algorithm to estimate /, z and by first separating the unknowns 
in two sets p{f , z\6_, g) and p{6]f,z_,g). Then, we separate again the first 
set in two subsets p{f\z_,6_,g) and p{z\6_,g). Finally, when possible, us- 
ing the separability along the channels, separate these two last terms in 
p{fj\zj,9j,gj) and p{zj\9j,gj). The general scheme is then, using these 

expressions, to generates samples f^^\z^^\6!^^^^ from the joint posterior 
law p{f,z_,9\g) and after the convergence of the Gibbs samplers, to com- 
pute their mean and to use them as the posterior estimates. 

In this paper we are not going to detail these methods. However, in the 
following we propos to examine some particular cases through a few case studies 
in relation to image restoration, image fusion and joint segmentation, blind 
image separation. 

4 Case studies 

4.1 Single channel image denoising and restoration 

The simplest example of inversion is a single channel image denoising and 
restoration when the PSF of the imaging system is given. The forward model 
for this problem is 



where the denoising case corresponds to the case where h{r) ~ S{r) and H = I. 
Assuming the noise to be centered, white and Gaussian with known variance 



g{r)~ h{r) * f {r) + e{r) , r G 7^ or g~Hf + e 



(93) 




(94) 



p{f{r)\z{r) - fc) = M{mk,<jl), k = \,---,K 



(95) 




(96) 



ren sev(r) 
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where 



/fe= {/(r) : r G 7^fe}, 7^fe = {r : z{r) = k) 
p{fMr) = k)= AA(mafc,Sfc) with = a^/^ 

nfcAA(mafc,Sfc) =A/'(m„S,) with 
m^= [mil'i, • • • ,TO/fl'j^]' and S2 = diag [Si, • ■ • , S^] 

and the posterior probabihty laws wc need to implement an MCMC like algo- 
rithm are: 

with Ts^{H^T.e'^H + (98) 



and / = S (iJ'Se^^g + S 



p[z\g,e)(x p{g\z,e)p{z) where 

pig\z,e)= J\f{Hm,,'Eg) with Hg = Hll^H^ 

and the posterior probabilities of the hyperparameters are: 



(99) 



p{mk\z,f) =Miiik,vl) with vl = ( ^ + ^ ) and pifc = u^, ( + -"T^ 

p(crfc|/, 2) =IQ{ak,(3k) with afe = afeg + ^ and /3fe = /^^q + ^^^y^ 

where = — /^(r) and = (/(r) - rrife)^ 

77 1 
p(a,2|/,g)^jg(«^/3^) witha^--+a^ and = - |lg - if + 

Uk = number of pixels in Rk and n = total number of pixels. 

Here, we show two examples of simulations: the first in relation with image 
denoising and the second in relation with image deconvolution. In both cases, 
we have choosed the same input image f{r). In the first case, we only has added 
a Gaussian noise and in the second case, we first blureed it with box car PSF of 
size 7x7 pixels and added a Gaussian noise. Fig. 1131 shows the original image, 
its contours and its regions. Fig. [T3] shows the observed noisy image and the 
results obtained by the proposed method. Remember that, in this method, we 
have also the estimated contours and region labels as byproducts. Fig. [TSl shows 
the observed blurred and noisy image and the results obtained by the proposed 
restoration method. 

For other inverse problems which can be modeled as a SISO model and where 
such Bayesian approach has been used refer to [52] 
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f{r) z{r) q{r) 



Figure 13: Original image, its contour and its region labels used for image 
denoising and image restoration. 




Figure 14: Observed noisy image and the results of the proposed denoising 
method. 




Figure 15: Observed noisy image and the results of the proposed restoration 
method. 
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4.2 Registered images fusion and joint segmentation 

Here, each observed image gi{r) (or equivalently g^) is assumed to be a noisy 
version of the unobserved real image fi{r) (or equivalently /j) 

(7,(r) = /,(r) + e,(r), r e 7^, or = /, + e„ i = l,---,M (100) 

which gives 

p(9,l/.)=W^,Se,) with Se, (101) 

and 

Pi9\l)^llp(9^\f^) with (102) 

i 

and all the unobserved real images fi{r), i — 1, ■ ■ ■ , M are assumed to have a 
common segmentation z(r) (or equivalently z) which is modeled by a discrete 
value Potts Random Markov Field (PRMF) . Then, using the same notations as 
in previous case, we have the following relations: 

p(/,(r)|z(r) = fc)= J\f{m,k,crf,^), k^l, ■■■,!< 

fik = {Mr) ■■ r e 7^,}, = {r : z(r) = k} 

Pif^k\<r) = k) = A/'(m,fclfc,Sifc) with S^fe = (T^^/fc 

p(z) = p{z{r),r en) <xc^p aJ2re'RT,seV(r)Hzir) - z{s)) 

pifi\z) = Afim^i^-S^i) with 

rrizi = [TOiil'i, ■ • ■ , m^^l'^]' and = diag [S^, ■ • ■ , Sj^^] 

PW = U^Pif^\^) 

and all the conditional and posterior probability laws wee need to implement 
the proposed Bayesian methods are summarized here: 



p{f,\z,e,,g,) = AA(/„S,) 

with = (Se-i +S,-i)-i and /, = (Serig^ + S^r^m, 

p{z\g,e) cx (n,p(g,N,0O) M^(r),r e7^) with 

p{g,\z,ei) = AA(m,„Eg^) with Eg^^ = S,, + Se, 

Pimik\fi,z,cjff^)= Af{fi^k,Vil) 

with ^,^^=,,2(-^ + !]i^) and 

Pi<^ik\fi^ ^) = ^_S{aik: Ptk) with ajfc 0,0 + ^ and /J^^, = + ^ 

where /i^ = Erei?^ Z*!'^) and = Erefl^ (/*(^) " ^^k) 

p{a,l\f,,g,) = IQ{a\,m with = f + afo and /3| = \\\g, -f,f + 

Tifc = number of pixels in i?^, 71 = total number of pixels. 

For more details on this model and its application in medical image fusion 
as well as in image fusion for security systems see [30l |3T| . 
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Figure 16: Image fusion and joint segmentation of two images from a security 
system measurement. 

4.3 Joint segmentation of hyper-spectral images 

The proposed model is the same as the model of the previous section except 
for the last equation of the forward model which assumes that the pixels in 
similar regions of different images are independent. For hyper-spectral images, 
this hypothesis is not valid and we have to account for their correlations. This 
work is under consideration. 

4.4 Segmentation of a video sequence of images 

Here, we can not assume that all the images in the video sequence have the 
same segmentation labels. However, we may use the segmentation obtained 
in an image as an initialization for the segmentation of next image. For more 
details on this model and to see a typical result see [?]. 

4.5 Joint segmentation and separation of instantaneous 
mixed images 

Here, the additional difficulty is that we also have to estimate the mixing matrix 
A. For more details on this model and to see some typical result in joint 
segmentation and separation of images see [28l 1321 IMl Ell [35l [36] . 

5 Conclusion 

In this paper we first showed that many image processing problems can be pre- 
sented as inverse problems by modeling the relation of the observed image to the 
unknown desired features explicitly. Then, we presented a very general forward 
modeling for the observations and a very general probabilistic modeling of im- 
ages through a hidden Markov modeling (HMM) which can be used as the main 
basis for many image processing problems such as: 1) simple or multi channel 
image restoration, 2) simple or joint image segmentation, 3) multi-sensor data 
and image fusion, 4) joint segmentation of color or hyper-spectral images and 
5) joint blind source separation (BSS) and segmentation. Finally, we presented 
detailed forward models, prior and posterior probability law expressions for the 
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implementation of MCMC algorithms for a few cases of those problems showing 
typical results which can be obtained using these methods. 
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